
feat: GovSignals v4.4.4 hardening (replaces #2, #3)#11

Closed
ConProgramming wants to merge 39 commits into main from gs-v4.4.4

Conversation

@ConProgramming

Summary

This is the replacement for the stale #2 (DIRTY/CONFLICTING, 87 files of WIP + accidental artifacts) and #3. It carries six clean GovSignals commits on top of upstream v4.4.4.

Commits

  1. feat(cli): add two-phase deploy (build-only + register-only) — replaces Add two-phase deploy flags #2. CLI flags --build-only, --register-only, --registry, --repository, --base-image-node, --containerfile-module, --skip-digest. Reads/writes .triggerdeploy.json. Removes the in-container CliApiClient call from the indexer.
  2. feat(webapp): add DEPLOY_IMAGE_OVERRIDE env var for custom image references — replaces feat: Add DEPLOY_IMAGE_OVERRIDE for custom image references #3. Bypasses auto-generation of image tags when set.
  3. chore: add govsignals CVE-remediation pnpm overrides — pnpm.overrides for ~50 CVE-flagged transitive deps so pnpm fetch --frozen-lockfile succeeds during hardened Docker builds.
  4. fix(core): inline @trigger.dev/database types to fix pruned monorepo build — works around the pruned monorepo build failure for @trigger.dev/core.
  5. feat(supervisor): parameterize worker-pod annotations via KUBERNETES_WORKER_POD_ANNOTATIONS — NEW. Replaces the previously hardcoded com.palantir.rubix.service/pod-cert annotation with a JSON-shaped env var. FedStart sets the Rubix value; GameWarden leaves the default ({}).
  6. feat(helm): add supervisor.extraVolumes/supervisor.extraVolumeMounts — NEW. Mirrors the existing webapp.extraVolumes/webapp.extraVolumeMounts pattern. Required for mounting CA bundles into the supervisor pod for compliance environments. Will be contributed back upstream as a separate PR.
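The JSON-shaped env var in commit 5 can be sketched as a small parser; this is a hypothetical helper, not the actual supervisor code, and the FedStart annotation value shown is an assumption:

```typescript
// Hypothetical sketch of parsing KUBERNETES_WORKER_POD_ANNOTATIONS
// (a JSON object of annotation key/value pairs) before merging it into
// the worker pod spec. Defaults to {} when unset — the GameWarden case.
function parseWorkerPodAnnotations(raw: string | undefined): Record<string, string> {
  if (!raw || raw.trim() === "") return {};
  const parsed = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be a JSON object");
  }
  // Kubernetes annotation values must be strings; coerce defensively.
  return Object.fromEntries(
    Object.entries(parsed).map(([k, v]) => [k, String(v)])
  );
}

// FedStart-style value (illustrative):
// KUBERNETES_WORKER_POD_ANNOTATIONS='{"com.palantir.rubix.service/pod-cert":"true"}'
```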

Why a fresh branch

PR #2 head was based on v4.0.0-v4-beta.24 (~9 months stale) and contained accidental artifacts: a vendored CLI tarball, a references/v3-catalog/src copy/ dev backup, a captured .triggerdeploy.json with project ref + image SHA + dirty-tree flag, a hardcoded developer path in buildImage.ts, and a deleted upstream package.json shim. This PR is rebased onto upstream v4.4.4 and contains only the production-ready content.

Diff vs v4.4.4

27 files changed, 2863 insertions(+), 3225 deletions(-)

Net negative because pnpm-lock.yaml shrinks under the override consolidation.

Closes

#2, #3

Tag

After merge, the merge commit will be tagged gs-v4.4.4 for the GovSignals image builds to consume.


ericallam and others added 30 commits March 26, 2026 07:59
…ter (triggerdotdev#3273)

fix: filter dev environments by userId in OrganizationsPresenter
…t to ClickHouse (triggerdotdev#3274)

Adds three new top-level columns to the ClickHouse task_runs_v2 table
primarily for analytics:

- `trigger_source` / `root_trigger_source` - extracted from the existing
TaskRun.annotations JSON during WAL
replication
- `is_warm_start` - new nullable boolean on TaskRun in Postgres, set in
the existing taskRun.update() at attempt
start (no additional write). null until the first attempt starts.

Run region is already available via the existing `worker_queue` column
in ClickHouse.
…titles (triggerdotdev#3276)

Only makes the span title text brighter when it's shown in the right
hand side inspector (in the spans list it stays dimmed)
…iggerdotdev#3254)

For human reviewer:

- Check if Redis connection + code makes sense
- Check CLI methods (it's on a hotpath)
- Check DB Migrations and new tables


## ✅ Checklist

- [x] I have followed every step in the [contributing
guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md)
- [x] The PR title follows the convention.
- [x] I ran and tested the code works

---

## Testing

Spawned new CLI / Dashboard notifications, checked the MVP, and checked
that failures do not cause any problems with the CLI/Dashboard

---

## Changelog

Added notifications mechanism for Dashboard and CLI

---

## Screenshots



Adds the `ComputeWorkloadManager` for routing task execution through the
compute gateway, including full checkpoint/restore support, OTel trace
integration, and template pre-warming.

## Changes

**Compute workload manager**
(`apps/supervisor/src/workloadManager/compute.ts`)
- Routes instance create, snapshot, delete, and restore through the
compute gateway API
- Wide event logging on create with full timing and context
- Configurable gateway timeout, auth token, image digest stripping

**Compute snapshot service**
(`apps/supervisor/src/services/computeSnapshotService.ts`)
- Timer wheel for delayed snapshot dispatch (avoids wasted work on
short-lived waitpoints)
- Configurable dispatch concurrency limit
(`COMPUTE_SNAPSHOT_DISPATCH_LIMIT`)
- Snapshot-complete callback handler with suspend completion reporting
- Trace context management and OTel span emission for snapshot
operations

**OTel trace service**
(`apps/supervisor/src/services/otlpTraceService.ts`)
- Fire-and-forget OTLP span emission for compute operations (provision,
restore, snapshot)
- BigInt nanosecond conversion preserving sub-ms precision for span
ordering

**Template creation**
(`apps/webapp/app/v3/services/computeTemplateCreation.server.ts`)
- Three-mode rollout: required (MICROVM projects), shadow (feature flag
/ percentage), skip
- Integrated into deploy finalize flow

**Shared compute package** (`internal-packages/compute/`)
- Gateway client with namespace-based API (instances, templates,
snapshots)
- Zod schemas for all gateway request/response types

**Database**
- `COMPUTE` variant added to `TaskRunCheckpointType` enum
- `WorkloadType` enum and column on `WorkerInstanceGroup`
- `hasComputeAccess` feature flag

**Env / config**
- Compute gateway URL, auth token, timeout
- Snapshot enable flag, delay, dispatch limit
- Dedicated OTLP endpoint for compute spans
(`COMPUTE_TRACE_OTLP_ENDPOINT`)
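The BigInt nanosecond conversion mentioned above can be sketched as follows; this is illustrative rather than the actual `otlpTraceService` code:

```typescript
// Convert a float millisecond timestamp to OTLP's unix-nano representation
// with BigInt, so sub-millisecond fractions survive instead of being
// truncated — which matters for span ordering.
function msToUnixNano(ms: number): bigint {
  // Split into whole ms and the fractional remainder, then scale each part
  // separately to avoid Number precision loss at nanosecond scale.
  const wholeMs = Math.floor(ms);
  const fracNs = Math.round((ms - wholeMs) * 1_000_000);
  return BigInt(wholeMs) * 1_000_000n + BigInt(fracNs);
}

// msToUnixNano(1.5) -> 1_500_000n
```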
…v#3297)

Adds support for taint tolerations for scheduled runs. Useful for
selectively tolerating taints on dedicated node pools.

The new `KUBERNETES_SCHEDULED_RUN_TOLERATIONS` env variable accepts a
comma-separated list in the format key=value:effect (or key:effect for
the Exists operator).
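A parser for the documented format might look like the sketch below; the shipped implementation may differ in details:

```typescript
// Parse the comma-separated KUBERNETES_SCHEDULED_RUN_TOLERATIONS format:
// "key=value:effect" (Equal operator), or "key:effect" (Exists operator).
type Toleration = {
  key: string;
  operator: "Equal" | "Exists";
  value?: string;
  effect: string;
};

function parseTolerations(raw: string): Toleration[] {
  return raw.split(",").map((entry) => {
    const [lhs, effect] = entry.trim().split(":");
    if (!effect) throw new Error(`Invalid toleration: ${entry}`);
    const eq = lhs.indexOf("=");
    return eq === -1
      ? { key: lhs, operator: "Exists", effect }
      : { key: lhs.slice(0, eq), operator: "Equal", value: lhs.slice(eq + 1), effect };
  });
}

// e.g. parseTolerations("dedicated=schedules:NoSchedule,gpu:NoExecute")
```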

Drive-by: renames all `KUBERNETES_SCHEDULE_*` affinity env vars to
KUBERNETES_SCHEDULED_RUN_* for clarity — this feature isn't used in
production yet or published in a tagged image; the name change is fine.
Adds a dialog to the admin orgs page for viewing and editing per-org
feature flag overrides. Flags are introspected from the catalog so the
UI stays in sync with available flags automatically. Also adds a new tab
for global flags.

Refactors featureFlags.server.ts to split catalog definition (shared)
from server-only runtime (flags(), makeSetMultipleFlags). The shared
module exports flag metadata and validation so both the UI and API
routes can use it without pulling in server dependencies.
…v#3281)

- Rebuild llm_pricing_tiers and llm_prices in syncLlmCatalog for
source=default
- Add vitest config, sync regression tests, and pin vitest 3.1.4
- Update pnpm-lock.yaml for the new devDependency
Add TTL (time-to-live) defaults at task-level and config-level, with
precedence: per-trigger > task > config > dev default (10m).

Docs PR: triggerdotdev#3200 (merge after packages are released)
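The precedence chain above reduces to nullish-coalescing in order; a minimal sketch, with illustrative names:

```typescript
// Resolve a run's TTL with the documented precedence:
// per-trigger > task > config > dev default (10m).
// Non-dev environments get no default TTL in this sketch.
function resolveTtl(opts: {
  triggerTtl?: string;
  taskTtl?: string;
  configTtl?: string;
  isDev: boolean;
}): string | undefined {
  return (
    opts.triggerTtl ??
    opts.taskTtl ??
    opts.configTtl ??
    (opts.isDev ? "10m" : undefined)
  );
}
```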
…ons (triggerdotdev#3299)

## Summary

Currently, every triggered run follows a two-step path through Redis:

1. **Enqueue** — A Lua script atomically adds the message to a queue
sorted set (ordered by priority-adjusted timestamp)
2. **Dequeue** — A debounced `processQueueForWorkerQueue` job fires
~500ms later, checks concurrency limits, removes the message from the
sorted set, and pushes it to a worker queue (Redis list) where workers
pick it up via `BLPOP`

This means every run pays at least ~500ms of latency between being
triggered and being available for a worker to execute, even when the
queue is empty and concurrency is wide open.

### What changed

The enqueue Lua scripts now atomically decide whether to **skip the
queue sorted set entirely** and push directly to the worker queue. This
happens inside the same Lua script that handles normal enqueue, so the
decision is atomic with respect to concurrency bookkeeping.

A run takes the **fast path** when all of these are true:
- **Fast path is enabled** for this worker queue (gated per
`WorkerInstanceGroup`)
- **No available messages** in the queue (`ZRANGEBYSCORE` finds nothing
with score ≤ now) — this respects priority ordering and allows fast path
even when the queue has future-scored messages (e.g. nacked retries with
delay)
- **Environment concurrency** has capacity
- **Queue concurrency** has capacity (including per-concurrency-key
limits for CK queues)

When the fast path is taken:
- The message is stored and pushed directly to the worker queue
(`RPUSH`)
- Concurrency slots are claimed (`SADD` to the same sets used by the
normal dequeue path)
- The `processQueueForWorkerQueue` job is **not scheduled** (no work to
do)
- TTL sorted set is skipped (the `expireRun` worker job handles TTL
independently)

When any condition fails, the existing slow path runs unchanged.
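The four conditions can be restated in plain TypeScript; note the real decision happens inside a Lua script, atomically with the concurrency bookkeeping, so this is only a readable restatement:

```typescript
// True when a freshly triggered run may skip the queue sorted set and be
// pushed directly to the worker queue.
function canTakeFastPath(state: {
  fastPathEnabled: boolean;   // gated per WorkerInstanceGroup
  availableMessages: number;  // messages with score <= now in the sorted set
  envConcurrencyUsed: number;
  envConcurrencyLimit: number;
  queueConcurrencyUsed: number;
  queueConcurrencyLimit: number;
}): boolean {
  return (
    state.fastPathEnabled &&
    state.availableMessages === 0 && // future-scored messages don't block
    state.envConcurrencyUsed < state.envConcurrencyLimit &&
    state.queueConcurrencyUsed < state.queueConcurrencyLimit
  );
}
```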

### Rollout gating

- **Development environments**: Fast path is always enabled
- **Production environments**: Gated by a new `enableFastPath` boolean
on `WorkerInstanceGroup` (defaults to `false`), allowing
region-by-region rollout

### Rolling deploy safety

Each process registers its own Lua scripts via `defineCommand`
(identified by SHA hash). Old and new processes never share scripts. The
Redis data structures are fully compatible in both directions — ack,
nack, and release operations work identically regardless of which path a
message took.

## Test plan

- [x] Fast path taken when queue is empty and concurrency available
- [x] Slow path when `enableFastPath` is false
- [x] Slow path when queue has available messages (respects priority
ordering)
- [x] Fast path when queue only has future-scored messages
- [x] Slow path when env concurrency is full
- [x] Fast-path message can be acknowledged correctly
- [x] Fast-path message can be nacked and re-enqueued to the queue
sorted set
- [x] Run all existing run-queue tests (ack, nack, CK, concurrency
sweeper, dequeue) to verify no regressions
- [x] Typecheck passes for run-engine and webapp
…otdev#3302)

Temporary workaround that enables filtering by environment in the
envvars page, without changing any UI.

---------

Co-authored-by: Claude <noreply@anthropic.com>
The @internal/compute package had its main/types pointing to
./src/index.ts with no build step. This works in dev (tsc resolves .ts
at compile time) but fails at runtime in Docker because Node.js can't
load .ts files directly.

Added tsconfig.build.json and build/clean/dev scripts matching the
pattern used by schedule-engine and other internal packages. Exports now
point to dist/.
- Added versions filtering on the Errors list and page
- Added errors stacked bars to the graph on the individual error page

---------

Co-authored-by: James Ritchie <james@trigger.dev>
Adds a migration reference for users moving from n8n to Trigger.dev.
Includes a concept map, four common patterns covering the
migration-specific gaps, and a full customer onboarding example. The
onboarding workflow highlights the 3-day wait pattern, an area where
n8n's execution model has known reliability issues at production scale
that Trigger.dev handles natively.
This allows seamless migration to different object storage.

Existing runs that have offloaded payloads/outputs will continue to use
the default object store (configured using `OBJECT_STORE_*` env vars).

You can add additional stores by setting new env vars:
- `OBJECT_STORE_DEFAULT_PROTOCOL` — determines where new large run
payloads will be stored.
- If you set it, you must also set the env vars for that protocol.
  
Example:

```
OBJECT_STORE_DEFAULT_PROTOCOL="s3"
OBJECT_STORE_S3_BASE_URL=https://s3.us-east-1.amazonaws.com
OBJECT_STORE_S3_ACCESS_KEY_ID=<val>
OBJECT_STORE_S3_SECRET_ACCESS_KEY=<val>
OBJECT_STORE_S3_REGION=us-east-1
OBJECT_STORE_S3_SERVICE=s3
```
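The routing idea can be sketched as selecting a variable prefix from the default protocol; this is a hedged illustration, and any variable names beyond those shown above are assumptions:

```typescript
// Resolve the object-store configuration for new payloads from the
// protocol-prefixed env vars, e.g. OBJECT_STORE_S3_BASE_URL when
// OBJECT_STORE_DEFAULT_PROTOCOL=s3. Falls back to the unprefixed
// OBJECT_STORE_* vars (the pre-existing default store) when unset.
function resolveStoreConfig(env: Record<string, string | undefined>) {
  const protocol = (env.OBJECT_STORE_DEFAULT_PROTOCOL ?? "default").toUpperCase();
  const prefix = protocol === "DEFAULT" ? "OBJECT_STORE" : `OBJECT_STORE_${protocol}`;
  return {
    protocol: protocol.toLowerCase(),
    baseUrl: env[`${prefix}_BASE_URL`],
    accessKeyId: env[`${prefix}_ACCESS_KEY_ID`],
    region: env[`${prefix}_REGION`],
  };
}
```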

---------

Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com>
Lots of layout and UI improvements to the new Models page

<img width="2282" height="1356" alt="CleanShot 2026-04-01 at 14 14 35"
src="https://github.com/user-attachments/assets/4a13a291-5d5b-415f-962f-b9cfb25b887d"
/>
)

## Summary

- Drop all 8 foreign key constraints on TaskRun. The run listing path is
now fully ClickHouse-backed so we no longer need Postgres to enforce
referential integrity on this table. The FK constraints add write
overhead on every insert/update with no remaining benefit. Prisma
queries are unaffected.
- Remove PostgresRunsRepository and its associated feature flag
(runsListRepository), which was the last remaining code path querying
TaskRun directly for list/count operations.
- Drop three indexes that were only useful for the Postgres run list
path and have no remaining query consumers:
- TaskRun_runtimeEnvironmentId_id_idx — was the cursor pagination index
for PostgresRunsRepository; superseded by the (runtimeEnvironmentId,
createdAt DESC) composite index
- TaskRun_scheduleId_idx — redundant with the (scheduleId, createdAt
DESC) composite index; no direct Postgres queries filter by scheduleId
alone
- TaskRun_rootTaskRunId_idx — no queries filter TaskRun by rootTaskRunId
as a WHERE clause anywhere in the codebase

All index drops use CONCURRENTLY IF EXISTS to avoid table locks in
production.

## Test plan

  - pnpm run db:migrate:deploy applies all migrations cleanly
  - pnpm run typecheck --filter webapp passes
  - Run list pages load correctly in the dashboard (ClickHouse path)
  - Scheduled task runs still trigger and appear correctly
…gerdotdev#3318)

Simple responsive breakpoint changes to the 3 onboarding screens + the
login screen to make it mobile friendlier

<img width="523" height="872" alt="CleanShot 2026-04-02 at 17 18 55"
src="https://github.com/user-attachments/assets/815d19ca-df9b-4b3e-8f6d-e00c81628679"
/>
This is a small improvement mainly with the UI Skills file:

- Animate open and close the Resizable panels
- Uses the built in animation hooks from react-window-splitter
- Includes a global variable for the animation easing and timing for
consistency


https://github.com/user-attachments/assets/50ed0019-ed12-4e08-b95c-7c6d1fe5bac0
### Text wrapping fix

- Fixes message text not wrapping on the run inspector if there were no
spaces in the text
- Fixes inspector title truncation
- Adds a copy text button for the Message property

<img width="468" height="740" alt="CleanShot 2026-04-04 at 10 19 02@2x"
src="https://github.com/user-attachments/assets/71e42bf3-d103-44a2-b3b4-937c0b60a4bc"
/>
…ssor (triggerdotdev#3331)

A single "fetch failed" from the object store was aborting the entire
batch stream with no retry. Added p-retry (3 attempts, 500ms–2s backoff)
around `uploadPacketToObjectStore` so transient network errors self-heal
server-side instead of propagating to the SDK.
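The backoff schedule described above is a capped exponential series; a minimal sketch of the delay computation (p-retry derives something similar from its `minTimeout`/`maxTimeout` options):

```typescript
// Compute the per-attempt retry delays for an exponential backoff with a cap,
// e.g. 3 attempts at 500ms base, 2s cap -> 500ms, 1s, 2s.
function backoffDelaysMs(
  attempts: number,
  minMs = 500,
  maxMs = 2000,
  factor = 2
): number[] {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(minMs * factor ** i, maxMs)
  );
}

// backoffDelaysMs(3) -> [500, 1000, 2000]
```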
…ev#3348)

Sets `application_name` on the Prisma writer and replica connection
strings using the existing `SERVICE_NAME` env var, so DB load can be
attributed by service.
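Appending `application_name` to a Postgres connection string can be done with the WHATWG URL API; an illustrative helper, not the actual Prisma setup code:

```typescript
// Set application_name on a Postgres connection string so the database
// can attribute load per service (visible in pg_stat_activity).
function withApplicationName(connectionString: string, serviceName: string): string {
  const url = new URL(connectionString);
  url.searchParams.set("application_name", serviceName);
  return url.toString();
}

// withApplicationName("postgres://user:pass@db:5432/app", "webapp")
```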
nicktrn and others added 9 commits April 13, 2026 11:24
…triggerdotdev#3366)

Adds region-level gating so MICROVM regions are only visible and usable
by orgs with the `hasComputeAccess` feature flag. Admins and explicit
allowlist behavior unchanged.

- New shared helper (`regionAccess.server.ts`) with
`resolveComputeAccess`, `defaultVisibilityFilter`, and
`isComputeRegionAccessible`
- `RegionsPresenter` filters out MICROVM regions for non-compute orgs
- `SetDefaultRegionService` blocks setting a MICROVM region as default
without compute access
- `WorkerGroupService` blocks triggering runs in MICROVM regions without
compute access
- `computeTemplateCreation` refactored to use shared
`resolveComputeAccess`
- Updated snapshot callback schema
…ev#3324)

- bugfix to show the changelog to the target audience
- more functionality for admins, to edit, delete and archive
notifications
## Summary
12 new features, 59 improvements, 17 bug fixes.

## Highlights

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([triggerdotdev#3196](triggerdotdev#3196))
- Large run outputs can use the new API, which allows switching object
storage providers.
([triggerdotdev#3275](triggerdotdev#3275))

## Improvements
- Add platform notifications support to the CLI. The `trigger dev` and
`trigger login` commands now fetch and display platform notifications
(info, warn, error, success) from the server. Includes discovery-based
filtering to conditionally show notifications based on project file
patterns, color markup rendering for styled terminal output, and a
non-blocking display flow with a spinner fallback for slow fetches. Use
`--skip-platform-notifications` flag with `trigger dev` to disable the
notification check.
([triggerdotdev#3254](triggerdotdev#3254))
- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([triggerdotdev#3255](triggerdotdev#3255))
- New `get_span_details` tool returns full span attributes, timing,
events, and AI enrichment (model, tokens, cost, speed)
- Span IDs now shown in `get_run_details` trace output for easy
discovery
- New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
- New `retrieveSpan()` method on the API client
- `get_query_schema` — discover available TRQL tables and columns
- `query` — execute TRQL queries against your data
- `list_dashboards` — list built-in dashboards and their widgets
- `run_dashboard_query` — execute a single dashboard widget query
- `whoami` — show current profile, user, and API URL
- `list_profiles` — list all configured CLI profiles
- `switch_profile` — switch active profile for the MCP session
- `start_dev_server` — start `trigger dev` in the background and stream
output
- `stop_dev_server` — stop the running dev server
- `dev_server_status` — check dev server status and view recent logs
- `GET /api/v1/query/schema` — query table schema discovery
- `GET /api/v1/query/dashboards` — list built-in dashboards
- `--readonly` flag hides write tools (`deploy`, `trigger_task`,
`cancel_run`) so the AI cannot make changes
- `read:query` JWT scope for query endpoint authorization
- `get_run_details` trace output is now paginated with cursor support
- MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools
- `get_query_schema` now requires a table name and returns only one
table's schema (was returning all tables)
- `get_current_worker` no longer inlines payload schemas; use new
`get_task_schema` tool instead
- Query results formatted as text tables instead of JSON (~50% fewer
tokens)
- `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
text instead of raw JSON
- Schema and dashboard API responses cached to avoid redundant fetches
- Adapted the CLI API client to propagate the trigger source via http
headers.
([triggerdotdev#3241](triggerdotdev#3241))
- Propagate run tags to span attributes so they can be extracted
server-side for LLM cost attribution metadata.
([triggerdotdev#3213](triggerdotdev#3213))
- Add optional `hasPrivateLink` field to the dequeue message
organization object for private networking support
([triggerdotdev#3264](triggerdotdev#3264))
- Define and manage AI prompts with `prompts.define()`. Create typesafe
prompt templates with variables, resolve them at runtime, and manage
versions and overrides from the dashboard without redeploying.
([triggerdotdev#3244](triggerdotdev#3244))

## Bug fixes
- Fix dev CLI leaking build directories on rebuild, causing disk space
accumulation. Deprecated workers are now pruned (capped at 2 retained)
when no active runs reference them. The watchdog process also cleans up
`.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL
from pnpm).
([triggerdotdev#3224](triggerdotdev#3224))
- Fix `--load` flag being silently ignored on local/self-hosted builds.
([triggerdotdev#3114](triggerdotdev#3114))
- Fixed `search_docs` tool failing due to renamed upstream Mintlify tool
(`SearchTriggerDev` → `search_trigger_dev`)
- Fixed `list_deploys` failing when deployments have null
`runtime`/`runtimeVersion` fields (triggerdotdev#3139)
- Fixed `list_preview_branches` crashing due to incorrect response shape
access
- Fixed `metrics` table column documented as `value` instead of
`metric_value` in query docs
- Fixed dev CLI leaking build directories on rebuild — deprecated
workers now clean up their build dirs when their last run completes

## Server changes

These changes affect the self-hosted Docker image and Trigger.dev Cloud:

- Add admin UI for viewing and editing feature flags (org-level
overrides and global defaults).
([triggerdotdev#3291](triggerdotdev#3291))
- AI prompt management dashboard and enhanced span inspectors.
  
  **Prompt management:**
- Prompts list page with version status, model, override indicators, and
24h usage sparklines
- Prompt detail page with template viewer, variable preview, version
history timeline, and override editor
- Create, edit, and remove overrides to change prompt content or model
without redeploying
  - Promote any code-deployed version to current
- Generations tab with infinite scroll, live polling, and inline span
inspector
- Per-prompt metrics: total generations, avg tokens, avg cost, latency,
with version-level breakdowns
  
  **AI span inspectors:**
- Custom inspectors for `ai.generateText`, `ai.streamText`,
`ai.generateObject`, `ai.streamObject` parent spans
- `ai.toolCall` inspector showing tool name, call ID, and input
arguments
  - `ai.embed` inspector showing model, provider, and input text
- Prompt tab on AI spans linking to prompt version with template and
input variables
  - Compact timestamp and duration header on all AI span inspectors
  
  **AI metrics dashboard:**
- Operations, Providers, and Prompts filters on the AI Metrics dashboard
  - Cost by prompt widget
  - "AI" section in the sidebar with Prompts and AI Metrics links
  
  **Other improvements:**
  - Resizable panel sizes now persist across page refreshes
- Fixed `<div>` inside `<p>` DOM nesting warnings in span titles and
chat messages
([triggerdotdev#3244](triggerdotdev#3244))
- Add allowRollbacks query param to the promote deployment API to enable
version downgrades
([triggerdotdev#3214](triggerdotdev#3214))
- Pre-warm compute templates on deploy for orgs with compute access.
Required for projects using a compute region, background-only for
others.
([triggerdotdev#3114](triggerdotdev#3114))
- Add automatic LLM cost calculation for spans with GenAI semantic
conventions. When a span arrives with `gen_ai.response.model` and token
usage data, costs are calculated from an in-memory pricing registry
backed by Postgres and dual-written to both span attributes
(`trigger.llm.*`) and a new `llm_metrics_v1` ClickHouse table that
captures usage, cost, performance (TTFC, tokens/sec), and behavioral
(finish reason, operation type) metrics.
([triggerdotdev#3213](triggerdotdev#3213))
- Add API endpoint `GET /api/v1/runs/:runId/spans/:spanId` that returns
detailed span information including properties, events, AI enrichment
(model, tokens, cost), and triggered child runs.
([triggerdotdev#3255](triggerdotdev#3255))
- Multi-provider object storage with protocol-based routing for
zero-downtime migration
([triggerdotdev#3275](triggerdotdev#3275))
- Add IAM role-based auth support for object stores (no access keys
required).
([triggerdotdev#3275](triggerdotdev#3275))
- Add platform notifications to inform users about new features,
changelogs, and platform events directly in the dashboard.
([triggerdotdev#3254](triggerdotdev#3254))
- Add private networking support via AWS PrivateLink. Includes
BillingClient methods for managing private connections, org settings UI
pages for connection management, and supervisor changes to apply
`privatelink` pod labels for CiliumNetworkPolicy matching.
([triggerdotdev#3264](triggerdotdev#3264))
- Reduce run start latency by skipping the intermediate queue when
concurrency is available. This optimization is rolled out per-region and
enabled automatically for development environments.
([triggerdotdev#3299](triggerdotdev#3299))
- Extended the search filter on the environment variables page to match
on environment type (production, staging, development, preview) and
branch name, not just variable name and value.
([triggerdotdev#3302](triggerdotdev#3302))
- Set `application_name` on Prisma connections from SERVICE_NAME so DB
load can be attributed by service
([triggerdotdev#3348](triggerdotdev#3348))
- Fix transient R2/object store upload failures during batchTrigger()
item streaming.
  
- Added p-retry (3 attempts, 500ms–2s exponential backoff) around
`uploadPacketToObjectStore` in `BatchPayloadProcessor.process()` so
transient network errors self-heal server-side rather than aborting the
entire batch stream.
- Removed `x-should-retry: false` from the 500 response on the batch
items route so the SDK's existing 5xx retry path can recover if
server-side retries are exhausted. Item deduplication by index makes
full-stream retries safe.
([triggerdotdev#3331](triggerdotdev#3331))
- Concurrency-keyed queues now use a single master queue entry per base
queue instead of one entry per key. Prevents high-CK-count tenants from
consuming the entire parentQueueLimit window and starving other tenants
on the same shard.
([triggerdotdev#3219](triggerdotdev#3219))
- Reduce lock contention when processing large `batchTriggerAndWait`
batches. Previously, each batch item acquired a Redis lock on the parent
run to insert a `TaskRunWaitpoint` row, causing
`LockAcquisitionTimeoutError` with high concurrency (880 errors/24h in
prod). Since `blockRunWithCreatedBatch` already transitions the parent
to `EXECUTING_WITH_WAITPOINTS` before items are processed, the per-item
lock is unnecessary. The new `blockRunWithWaitpointLockless` method
performs only the idempotent CTE insert without acquiring the lock.
([triggerdotdev#3232](triggerdotdev#3232))
- Strip `secure` query parameter from QUERY_CLICKHOUSE_URL before
passing to ClickHouse client. This was already done for the main and
logs ClickHouse clients but was missing for the query client, causing a
startup crash with `Error: Unknown URL parameters: secure`.
([triggerdotdev#3204](triggerdotdev#3204))
- Fix `OrganizationsPresenter.#getEnvironment` matching the wrong
development environment on teams with multiple members. All dev
environments share the slug `"dev"`, so the previous `find` by slug
alone could return another member's environment. Now filters DEVELOPMENT
environments by `orgMember.userId` to ensure the logged-in user's dev
environment is selected.
([triggerdotdev#3273](triggerdotdev#3273))
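The `secure`-parameter fix a few items above can be sketched in a couple of lines; illustrative only, not the shipped code:

```typescript
// Strip the `secure` query parameter (unsupported by the ClickHouse client)
// from a connection URL before constructing the client, leaving all other
// parameters intact.
function stripSecureParam(clickhouseUrl: string): string {
  const url = new URL(clickhouseUrl);
  url.searchParams.delete("secure");
  return url.toString();
}
```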

<details>
<summary>Raw changeset output</summary>

# Releases
## @trigger.dev/build@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## trigger.dev@4.4.4

### Patch Changes

- Add platform notifications support to the CLI. The `trigger dev` and
`trigger login` commands now fetch and display platform notifications
(info, warn, error, success) from the server. Includes discovery-based
filtering to conditionally show notifications based on project file
patterns, color markup rendering for styled terminal output, and a
non-blocking display flow with a spinner fallback for slow fetches. Use
`--skip-platform-notifications` flag with `trigger dev` to disable the
notification check.
([triggerdotdev#3254](triggerdotdev#3254))

- Fix dev CLI leaking build directories on rebuild, causing disk space
accumulation. Deprecated workers are now pruned (capped at 2 retained)
when no active runs reference them. The watchdog process also cleans up
`.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL
from pnpm).
([triggerdotdev#3224](triggerdotdev#3224))

- Fix `--load` flag being silently ignored on local/self-hosted builds.
([triggerdotdev#3114](triggerdotdev#3114))

- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([triggerdotdev#3255](triggerdotdev#3255))

- New `get_span_details` tool returns full span attributes, timing,
events, and AI enrichment (model, tokens, cost, speed)
- Span IDs now shown in `get_run_details` trace output for easy
discovery
    -   New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
    -   New `retrieveSpan()` method on the API client

- MCP server improvements: new tools, bug fixes, and new flags.
([triggerdotdev#3224](triggerdotdev#3224))

    **New tools:**

    -   `get_query_schema` — discover available TRQL tables and columns
    -   `query` — execute TRQL queries against your data
    -   `list_dashboards` — list built-in dashboards and their widgets
    -   `run_dashboard_query` — execute a single dashboard widget query
    -   `whoami` — show current profile, user, and API URL
    -   `list_profiles` — list all configured CLI profiles
    -   `switch_profile` — switch active profile for the MCP session
- `start_dev_server` — start `trigger dev` in the background and stream
output
    -   `stop_dev_server` — stop the running dev server
- `dev_server_status` — check dev server status and view recent logs

    **New API endpoints:**

    -   `GET /api/v1/query/schema` — query table schema discovery
    -   `GET /api/v1/query/dashboards` — list built-in dashboards

    **New features:**

    -   `--readonly` flag hides write tools (`deploy`, `trigger_task`,
        `cancel_run`) so the AI cannot make changes
    -   `read:query` JWT scope for query endpoint authorization
    -   `get_run_details` trace output is now paginated with cursor support
    -   MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all
        tools

    **Bug fixes:**

    -   Fixed `search_docs` tool failing due to renamed upstream Mintlify
        tool (`SearchTriggerDev` → `search_trigger_dev`)
    -   Fixed `list_deploys` failing when deployments have null
        `runtime`/`runtimeVersion` fields (triggerdotdev#3139)
    -   Fixed `list_preview_branches` crashing due to incorrect response
        shape access
    -   Fixed `metrics` table column documented as `value` instead of
        `metric_value` in query docs
    -   Fixed dev CLI leaking build directories on rebuild — deprecated
        workers now clean up their build dirs when their last run completes

    **Context optimizations:**

    -   `get_query_schema` now requires a table name and returns only one
        table's schema (was returning all tables)
    -   `get_current_worker` no longer inlines payload schemas; use the new
        `get_task_schema` tool instead
    -   Query results formatted as text tables instead of JSON (~50% fewer
        tokens)
    -   `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
        text instead of raw JSON
    -   Schema and dashboard API responses cached to avoid redundant
        fetches

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([triggerdotdev#3196](triggerdotdev#3196))
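
    The changelog names a three-level precedence: per-trigger TTL overrides
    a task-level default, which overrides the global default from
    `trigger.config.ts`. A hypothetical sketch of that precedence as a pure
    function; the names and value shape are illustrative, not the SDK API.

    ```typescript
    // Hypothetical sketch of the documented precedence. The option names
    // and TTL value shape ("10m" or seconds) are assumptions.
    type TtlValue = string | number | undefined;

    function resolveTtl(
      perTrigger: TtlValue,   // set on an individual trigger call
      taskDefault: TtlValue,  // set at the task level
      globalDefault: TtlValue // set in trigger.config.ts
    ): TtlValue {
      return perTrigger ?? taskDefault ?? globalDefault;
    }
    ```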

- Adapted the CLI API client to propagate the trigger source via HTTP
headers.
([triggerdotdev#3241](triggerdotdev#3241))

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`
    -   `@trigger.dev/build@4.4.4`
    -   `@trigger.dev/schema-to-json@4.4.4`

## @trigger.dev/core@4.4.4

### Patch Changes

- Fix `list_deploys` MCP tool failing when deployments have null
`runtime` or `runtimeVersion` fields.
([triggerdotdev#3224](triggerdotdev#3224))

- Propagate run tags to span attributes so they can be extracted
server-side for LLM cost attribution metadata.
([triggerdotdev#3213](triggerdotdev#3213))

- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([triggerdotdev#3255](triggerdotdev#3255))

    -   New `get_span_details` tool returns full span attributes, timing,
        events, and AI enrichment (model, tokens, cost, speed)
    -   Span IDs now shown in `get_run_details` trace output for easy
        discovery
    -   New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
    -   New `retrieveSpan()` method on the API client

- MCP server improvements: new tools, bug fixes, and new flags.
([triggerdotdev#3224](triggerdotdev#3224))

    **New tools:**

    -   `get_query_schema` — discover available TRQL tables and columns
    -   `query` — execute TRQL queries against your data
    -   `list_dashboards` — list built-in dashboards and their widgets
    -   `run_dashboard_query` — execute a single dashboard widget query
    -   `whoami` — show current profile, user, and API URL
    -   `list_profiles` — list all configured CLI profiles
    -   `switch_profile` — switch active profile for the MCP session
    -   `start_dev_server` — start `trigger dev` in the background and
        stream output
    -   `stop_dev_server` — stop the running dev server
    -   `dev_server_status` — check dev server status and view recent logs

    **New API endpoints:**

    -   `GET /api/v1/query/schema` — query table schema discovery
    -   `GET /api/v1/query/dashboards` — list built-in dashboards

    **New features:**

    -   `--readonly` flag hides write tools (`deploy`, `trigger_task`,
        `cancel_run`) so the AI cannot make changes
    -   `read:query` JWT scope for query endpoint authorization
    -   `get_run_details` trace output is now paginated with cursor support
    -   MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all
        tools

    **Bug fixes:**

    -   Fixed `search_docs` tool failing due to renamed upstream Mintlify
        tool (`SearchTriggerDev` → `search_trigger_dev`)
    -   Fixed `list_deploys` failing when deployments have null
        `runtime`/`runtimeVersion` fields (triggerdotdev#3139)
    -   Fixed `list_preview_branches` crashing due to incorrect response
        shape access
    -   Fixed `metrics` table column documented as `value` instead of
        `metric_value` in query docs
    -   Fixed dev CLI leaking build directories on rebuild — deprecated
        workers now clean up their build dirs when their last run completes

    **Context optimizations:**

    -   `get_query_schema` now requires a table name and returns only one
        table's schema (was returning all tables)
    -   `get_current_worker` no longer inlines payload schemas; use the new
        `get_task_schema` tool instead
    -   Query results formatted as text tables instead of JSON (~50% fewer
        tokens)
    -   `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
        text instead of raw JSON
    -   Schema and dashboard API responses cached to avoid redundant
        fetches

- Large run outputs can now use the new API, which allows switching
object storage providers.
([triggerdotdev#3275](triggerdotdev#3275))

- Add optional `hasPrivateLink` field to the dequeue message
organization object for private networking support
([triggerdotdev#3264](triggerdotdev#3264))

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([triggerdotdev#3196](triggerdotdev#3196))

- Adapted the CLI API client to propagate the trigger source via HTTP
headers.
([triggerdotdev#3241](triggerdotdev#3241))

## @trigger.dev/python@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/sdk@4.4.4`
    -   `@trigger.dev/core@4.4.4`
    -   `@trigger.dev/build@4.4.4`

## @trigger.dev/react-hooks@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/redis-worker@4.4.4

### Patch Changes

- Adapted the CLI API client to propagate the trigger source via HTTP
headers.
([triggerdotdev#3241](triggerdotdev#3241))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/rsc@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/schema-to-json@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/sdk@4.4.4

### Patch Changes

- Define and manage AI prompts with `prompts.define()`. Create typesafe
prompt templates with variables, resolve them at runtime, and manage
versions and overrides from the dashboard without redeploying.
([triggerdotdev#3244](triggerdotdev#3244))
- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([triggerdotdev#3196](triggerdotdev#3196))
- Adapted the CLI API client to propagate the trigger source via HTTP
headers.
([triggerdotdev#3241](triggerdotdev#3241))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

</details>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Adds support for splitting deployment into separate build and register
phases. New CLI options: --build-only, --register-only, --registry,
--repository, --base-image-node, --containerfile-module, --skip-digest.

Also adds: worker pod service account configuration, security context,
DEPLOY_VERSION_SUFFIX env var, custom containerfile module support,
and index metadata extraction from Docker images.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rences

When set, bypasses auto-generation of image tags and uses the provided
image reference directly for all deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
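
A minimal sketch of the behavior this commit describes: when the override
env var is set, its value is used verbatim as the image reference;
otherwise the tag is auto-generated. The function and the tag format below
are illustrative stand-ins, not the webapp's actual code.

```typescript
// Illustrative sketch only. `resolveDeployImage` and the fallback tag
// format are assumptions standing in for the webapp's tag generator;
// only the "use the override verbatim when set" rule is from the commit.
function resolveDeployImage(
  override: string | undefined, // DEPLOY_IMAGE_OVERRIDE
  registry: string,
  repo: string,
  version: string
): string {
  if (override && override.length > 0) return override;
  return `${registry}/${repo}:${version}`; // stand-in for auto-generation
}
```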
Pre-applies the pnpm-overrides.json from govsignals hardened Dockerfiles
into package.json so the lockfile is consistent and `pnpm fetch
--frozen-lockfile` succeeds during Docker builds.

Replaces upstream scoped overrides (form-data@^2, axios@1.9.0, etc.)
with blanket CVE-driven versions. Adds overrides for ws, tar, semver,
@babel/runtime, cookie, undici, zod, Remix packages, and others.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
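
The pre-apply step can be sketched as a merge of an override map into
`package.json`'s `pnpm.overrides`, with the override file winning on
conflicts. The function name, file handling, and version strings below are
assumptions for illustration; the actual override versions live in the
govsignals `pnpm-overrides.json`.

```typescript
// Illustrative sketch: merge CVE overrides into package.json's
// pnpm.overrides so the committed lockfile matches what hardened Docker
// builds expect. Merge policy (override map wins) is an assumption.
interface PackageJson {
  pnpm?: { overrides?: Record<string, string> };
}

function applyOverrides(
  pkg: PackageJson,
  overrides: Record<string, string>
): PackageJson {
  return {
    ...pkg,
    pnpm: {
      ...pkg.pnpm,
      overrides: { ...pkg.pnpm?.overrides, ...overrides },
    },
  };
}
```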
…build

The supervisor Docker build (turbo prune --scope=supervisor) fails because
@trigger.dev/database Prisma types don't resolve in the pruned context.
Replace type-only imports with inline string literal types derived from
the Prisma schema, decoupling core from database at build time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
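
The decoupling technique described above can be sketched as follows:
instead of type-only imports from `@trigger.dev/database`, declare
equivalent string-literal unions locally so `turbo prune` contexts compile
without the Prisma client. The member names below are illustrative, not
the actual Prisma schema values.

```typescript
// Sketch of inlining a Prisma enum as a local string-literal union.
// The union members and helper are illustrative examples only.
type RunStatus = "PENDING" | "EXECUTING" | "COMPLETED" | "FAILED";

function isTerminal(status: RunStatus): boolean {
  // Terminal statuses never transition again.
  return status === "COMPLETED" || status === "FAILED";
}
```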
…WORKER_POD_ANNOTATIONS

Replaces the hardcoded Rubix `com.palantir.rubix.service/pod-cert` annotation
with a JSON-shaped env var so the same supervisor image can run in environments
that need different per-pod annotations (Rubix in FedStart, none/empty in
GameWarden).

Default is `{}` so behavior matches upstream when the env is unset.
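
A hedged sketch of how a JSON-shaped annotations env var with an
empty-object default might be parsed. The env var name
`KUBERNETES_WORKER_POD_ANNOTATIONS` and the `{}` default come from the
commit; the parsing and validation code is an assumption, not the
supervisor's actual implementation.

```typescript
// Illustrative sketch: parse a JSON-shaped annotations env var, falling
// back to {} when unset so behavior matches upstream. Validation details
// are assumptions.
function parseWorkerPodAnnotations(
  raw: string | undefined // process.env.KUBERNETES_WORKER_POD_ANNOTATIONS
): Record<string, string> {
  if (!raw || raw.trim() === "") return {}; // default matches upstream
  const parsed = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("KUBERNETES_WORKER_POD_ANNOTATIONS must be a JSON object");
  }
  return parsed as Record<string, string>;
}
```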
Mirrors the existing webapp.extraVolumes / webapp.extraVolumeMounts pattern.
Required for compliance environments that need to mount a custom CA bundle
ConfigMap into the supervisor pod (e.g. Palantir Rubix or GameWarden CAs)
and point NODE_EXTRA_CA_CERTS at it.

Both extras render unconditionally — they no longer depend on the legacy
bootstrap-disabled volume block — so any consumer can opt in.
@github-actions

Thanks for your contribution! We require all external PRs to be opened in draft status first so you can address CodeRabbit review comments and ensure CI passes before requesting a review. Please re-open this PR as a draft. See CONTRIBUTING.md for details.

@github-actions github-actions Bot closed this Apr 29, 2026
@ConProgramming
Author

Re-opening as a clean draft PR rebased onto current main (which has been synced past v4.4.4). The 6 GovSignals commits are unchanged in intent; package.json overrides were merged to keep upstream's range-targeted CVE entries plus our unique additions.

